Abstract

We introduce a novel tracking technique that uses dynamic confidence-based fusion of two different information sources for robust and efficient tracking of visual objects. Mean-shift tracking is a popular and well-known method for object tracking. The original algorithm optimizes a similarity measure by shifting a search area to the center of a generated "weight image". Recent improvements on the original mean-shift algorithm involve using a classifier that differentiates the object from its surroundings. We adopt this classifier-based approach and propose an application of a classifier fusion technique within it. We use two different classifiers, one of which comes from a background modeling method, to generate weight images, and we calculate the contributions of the classifiers dynamically from their confidences to generate the final weight image used in tracking. The contributions are calculated from the correlations between the histograms of the classifiers' weight images and the histogram of a defined ideal weight image in the previous frame. Our experiments show that the dynamic combination scheme selects good contributions for the classifiers in different cases and improves tracking accuracy significantly.
1. Introduction

Object tracking in image sequences is an important problem in computer vision applications. Mean-shift tracking is a popular technique for object tracking: it models the color histogram of the tracked object in a frame and shifts the tracking window to the neighboring area in the next frame whose histogram is most similar to the modeled one. Although the original method relies on an iterative optimization of a similarity measure, the interpretation of the optimization target, called the weight image, has allowed extended techniques to be developed. Classifier-based mean-shift tracking [1] is such an extension, where the weight image is generated by a classifier system that performs binary classification of "object" and "outside" pixels in a local search window.
Another interesting problem concerning moving objects is background modeling. One of the popular techniques for discriminating moving objects from a constant background in a long sequence of images is the Stauffer-Grimson method [2], which models the background pixels with mixtures of Gaussians. After a sufficient number of frames, the background model generated for each pixel can be used to find the similarity of the pixel to the background at any time. One advantage of this method is that it models the background quickly (usually within ten frames), which allows even non-fixed cameras to benefit from it while they are temporarily stationary.

Handling the tracking problem within a classifier-based framework enables principled fusion of classifiers to be applied to the problem. In this work we combine two kinds of classifiers into a classifier combination system that is used to track a desired object over time. The first kind is a binary classifier that differentiates the tracked object from its surroundings. It follows the same logic as the classifiers used in conventional classifier-based mean-shift trackers, which are binary classifiers trained on pixels that belong to the tracked object and to its surrounding area in the local search window in the previous frame or frames. This classifier generates, for each pixel in the current frame, the posterior probability of belonging to the tracked object.

The second kind of classifier is derived from the 
background model of the Stauffer-Grimson method, 
where we derive posterior probability of belonging 
to any foreground object for each pixel. Then with 
a novel proposed scheme, we dynamically calculate 
confidence values (i.e. weights) for each classifier 
and find a weighted combination of these classifiers 
to use in mean-shift tracking. 

As mentioned above and reviewed in the following sections, previous works rely either on binary classifiers [1] or on background models to generate the weight images used for tracking. In our work, we investigate the disadvantages of using these approaches separately and give examples of situations in which the individual methods fail. We therefore propose a novel approach that combines both and benefits from the advantages of each. We propose a dynamic scheme that assigns weights to the individual classifiers and generates weighted combinations of them. Because it uses a well-defined metric for classifier confidence, the scheme can select suitable weights as the situation changes. In our experiments, this combination leads to better tracking performance than either classifier alone.

This paper is organized as follows: in Section 2 we give a brief literature review of the classifier-based mean-shift tracking approach, and in Section 3 we similarly review the Stauffer-Grimson method and how it can be used to generate posterior probabilities for foreground objects. Section 4 describes our proposed combination scheme, and finally in Section 5 the results are presented together with discussion and comments.

2. Classifier Based Mean-Shift Tracking 

Mean-shift was originally proposed for estimating the gradient of a density function [3] and has been used for feature space analysis [4]. Since its introduction [5, 6] into the computer vision literature as an object tracking technique, mean-shift tracking has been a well-known and widely referenced method for tracking non-rigid objects.




Figure 1: A sample weight image that gives much higher 
values to object pixels with respect to surrounding area 



The original mean-shift tracking relies on modeling the grayscale histogram of the tracked object, which is called the target model ($q(b) : b \in \{1, \ldots, B\}$). In the next frame, the histogram of the search window (the target candidate, $p(b)$) is generated, and a similarity measure (the Bhattacharyya coefficient) between the target model and the candidate is defined:



$$BC(p, q) = \sum_{b=1}^{B} \sqrt{p(b)\, q(b)}. \tag{1}$$
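As an illustration of the Bhattacharyya coefficient between a target model and a target candidate, the following is a minimal sketch in plain Python. The helper `normalized_histogram`, the choice of four bins and the sample pixel values are assumptions made for the example and do not come from the original work.

```python
import math

def normalized_histogram(values, num_bins, max_value=256):
    """Histogram of grayscale values in [0, max_value), normalized to sum to 1."""
    counts = [0] * num_bins
    bin_width = max_value / num_bins
    for v in values:
        counts[min(int(v / bin_width), num_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_bins

def bhattacharyya_coefficient(p, q):
    """BC(p, q): sum over bins of sqrt(p(b) * q(b)) for two normalized histograms."""
    return sum(math.sqrt(pb * qb) for pb, qb in zip(p, q))

# Hypothetical grayscale samples for the target model q and a candidate p.
target = normalized_histogram([10, 12, 200, 210, 220, 230], num_bins=4)
candidate = normalized_histogram([15, 205, 215, 225, 100, 110], num_bins=4)
similarity = bhattacharyya_coefficient(candidate, target)
```

With normalized histograms the coefficient lies in [0, 1]: it reaches 1 when the two histograms are identical and 0 when they share no occupied bin.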



To track the object, the search window should be iteratively shifted in the direction where this similarity is maximal. The direction and magnitude of the shift vector are calculated by optimizing the similarity measure, which yields a weight value for each pixel:



$$w_i = \sqrt{\frac{q(b(X_i))}{p(b(X_i))}}. \tag{2}$$



Here $b(X_i)$ gives the bin $b$ ($b \in \{1, \ldots, B\}$) for the feature vector $X_i$ of pixel $i$ (which is simply the grayscale value of the pixel in the original work). Equation (2) can be interpreted as a weight that assigns higher values to grayscale levels that are more frequent in the target model but less frequent in the target candidate. The set of $w_i$ values forms the so-called weight image. After the weight value of each pixel is obtained, the center of the search window is shifted to the center of weight of the weight image. This procedure is repeated until the window no longer shifts, an example of which can be seen in Figure 2. Since the tracking process shifts the search window towards the center of weight, the weight image is better the more clearly the object is separated from its surroundings by higher values. A sample weight image can be seen in Figure 1.
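The per-pixel weights and the center-of-weight shift can be sketched as follows. This is a minimal plain-Python illustration: the function names, the `eps` guard against empty candidate bins and the toy two-bin data are assumptions of the example, not part of the original algorithm description.

```python
import math

def pixel_weights(pixel_bins, q, p, eps=1e-12):
    """w_i = sqrt(q(b(X_i)) / p(b(X_i))); pixel_bins[i] is the bin index b(X_i)."""
    return [math.sqrt(q[b] / max(p[b], eps)) for b in pixel_bins]

def center_of_weight(positions, weights):
    """Weighted mean of the pixel coordinates; positions is a list of (x, y)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    cx = sum(w * x for (x, _), w in zip(positions, weights)) / total
    cy = sum(w * y for (_, y), w in zip(positions, weights)) / total
    return cx, cy

# Toy example: bin 1 is frequent in the target model q but rare in the candidate p,
# so pixels falling in bin 1 pull the window towards themselves.
q = [0.2, 0.8]
p = [0.8, 0.2]
positions = [(0, 0), (1, 0), (2, 0), (3, 0)]
bins = [0, 0, 1, 1]
weights = pixel_weights(bins, q, p)
cx, cy = center_of_weight(positions, weights)
```

In this toy case the two bin-1 pixels receive weight 2 and the two bin-0 pixels weight 0.5, so the center of weight moves from the geometric center (x = 1.5) towards the bin-1 pixels (x = 2.1).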

Figure 2: Sample iterations for a mean-shift tracking process, showing the calculation of the search window in a new frame

The above interpretation of weight values has given rise to different approaches for generating the weight image. One such approach [7] uses the Red-Green-Blue (RGB) channels and 49 different linear combinations of them for the frame. The discrimination between the classes (i.e. object and outside) of pixels in the search window is calculated for each combination, and the top 5 are selected for the final tracking decision. The discrimination for each feature is calculated as a variance ratio of the log-likelihoods of the class histograms, which yields higher values for features where the two classes are better separated in the histogram, that is, where within-class variances are low and the overall variance is high. This interpretation is very similar to the Fisher discriminant used in linear discriminant analysis.

Another similar work [8] again uses the best-separating linear combinations of RGB channels, but measures separation by the Bayes error of the class histograms. Bayes error exposes discriminative power better in multimodal distributions, whereas the variance ratio relies on separation of the modes of the distributions. The error rate is lower when the two classes share fewer pixels, which indicates a separation of the classes in the histogram. The derivation of Bayes error from Bayesian decision theory relates this work to Bayesian classifiers.

Since these approaches rely on mappings that take pixel features (either grayscale values directly or linear combinations of RGB channels) as inputs and output weight values, a good classifier that discriminates the two classes, object and outside, can be used to generate these weight values. A detailed analysis of classifier-based mean-shift tracking is presented in [1], where weight values are generated by classifier ensembles. In that work, an ensemble of weak classifiers that distinguish object pixels from outer pixels is trained at each step. During training, each pixel contributes a pair $\{X_i, C_i\}$, where $X_i$ is the feature vector for pixel $i$ (appearance features such as grayscale intensity or RGB color values, or texture features such as Gabor coefficients, gradient histograms or local binary patterns) and belongs to the $d$-dimensional input feature space $\mathbb{R}^d$. $C_i \in \{-1, +1\}$ denotes the label of the pixel. The weak classifiers $h(X)$ map input pixels to labels such that:



$$h(X) : \mathbb{R}^d \to \{-1, +1\}. \tag{3}$$



The classifiers $h(X)$ try to separate the two classes in $\mathbb{R}^d$. Most classifiers can output a score (posterior probability) $c(X) \in [0, 1]$ that takes higher values if the pixel is more likely to belong to the tracked object. The score values of the pixels are then used as a weight image, $w_i = c(X_i)$, and mean-shift tracking is applied. In [1], not only the color values of pixels but also texture values calculated from local gradients are used as features for the classifiers. Although we only use color values in this work, the explicit modeling of object/outer classifier-based weight image generation is the baseline for the first kind of classifier that we use in our classifier combination system.

Figure 3: A sample search window composed of object (green) and outer (red) areas. The classifier is trained using the pixels in those areas, and on the next frame posterior probability values are calculated for each pixel in the same search window

In this work we define the outer area such that it surrounds the object area, and together they form the larger search window, so the bounds of the search window in the following tracking process are the bounds of the outer area, as shown in Figure 3. We train our binary classifier by assigning class label +1 to the pixels in the object area and class label -1 to the pixels in the outer area. We train the classifier on the pixels of frame $t$ (after tracking for that frame is completed) and apply it to the pixels within the same search window at frame $t + 1$ to get a posterior probability value for each pixel. However, the center of weight is still calculated in the smaller object window, which is shifted during the mean-shift iterations to obtain its new location in frame $t + 1$. In contrast with the original mean-shift algorithm, the weight image is not updated after each mean-shift iteration since there is no histogram calculation; instead, the weight image calculated in the larger search window is used directly.
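The iteration over a fixed, classifier-generated weight image can be sketched as below. This is a simplified plain-Python illustration with a fixed-size rectangular object window; the function name, the `(x, y, w, h)` window layout, the rounding rule and the synthetic 8x8 weight image are assumptions of the example.

```python
import math

def mean_shift_on_weight_image(weights, window, max_iter=20):
    """Shift an object window (x, y, w, h) over a fixed weight image.

    weights is a list of rows (weights[y][x]). The window is re-centered on the
    center of weight of the pixels it covers until it stops moving.
    """
    x, y, w, h = window
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sx = sy = 0.0
        for yy in range(max(y, 0), min(y + h, rows)):
            for xx in range(max(x, 0), min(x + w, cols)):
                wt = weights[yy][xx]
                total += wt
                sx += wt * xx
                sy += wt * yy
        if total == 0:
            break  # no evidence inside the window; keep the current position
        # Top-left corner that centers the window on the weighted centroid.
        new_x = int(math.floor(sx / total - (w - 1) / 2 + 0.5))
        new_y = int(math.floor(sy / total - (h - 1) / 2 + 0.5))
        if (new_x, new_y) == (x, y):
            break  # converged: no further shift
        x, y = new_x, new_y
    return x, y, w, h

# Synthetic weight image: a 3x3 block of ones (the "object") at rows 4..6, columns 4..6.
image = [[0.0] * 8 for _ in range(8)]
for yy in range(4, 7):
    for xx in range(4, 7):
        image[yy][xx] = 1.0

final_window = mean_shift_on_weight_image(image, (2, 2, 3, 3))
```

Starting from a window that only touches one corner of the block, the window moves onto the block in two shifts and then stops. The weight image itself is never recomputed between iterations, which is the difference from the original algorithm noted above.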

The normalization of the score values to the range [0, 1] allows us to interpret them as posterior probability values. Let $C_i$ denote a discrete random variable taking values in $\{-1, +1\}$ such that $C_i = +1$ if pixel $i$ belongs to the object and $C_i = -1$ if it belongs to the outer area. Then the classification score of a pixel can be seen as the posterior probability of belonging to the object for that pixel:

$$p(C_i = +1 \mid X_i) = c(X_i), \tag{4}$$

and for the same pixel the probability of belonging to the outer area is then given by its complement:

$$p(C_i = -1 \mid X_i) = 1 - c(X_i). \tag{5}$$

Using classifiers to generate the weight image allows the tracking problem to be handled in a classifier-based framework. As mentioned in the introduction, this allows classifier-specific techniques, such as the classifier fusion proposed in this work, to be applied to the problem. Although training a classifier may appear to add computational load, a classifier is trained and used to generate the pixel weight values only once per frame, whereas in the original mean-shift tracking the weight values are recalculated at each iteration. Using classifiers also avoids the multi-dimensional feature histograms that Equations (1) and (2) would require, and so easily allows higher-dimensional features to be used.

In the original mean-shift algorithm, kernels can be used to emphasize the effect of central pixels in the tracking window by increasing the weight of their contribution to the object histogram. This idea can also be used during classifier training, either by taking multiple samples from central pixels or by weighting central pixels more heavily with classification algorithms that can work with weighted training data. However, in this paper we do not perform any kernel weighting, since it did not improve our results.

3. Background Modeling and Object Classifier

The background modeling proposed by Stauffer and Grimson [2] handles each pixel as a three-dimensional (RGB) random vector $X$. The vectors are assumed to be generated by a mixture of Gaussians, where each pixel's RGB values are generated by an individual mixture. Gaussians capture the deviations of a background object, and since a pixel may show different background objects over time, more than one Gaussian is used. For example, in a windy scene where a tree is observed, the leaves of the tree may move, so that in some pixels both a leaf and the background sky are observed over time. Although the observed value of the pixel alternates between two groups of values, the values are generated by fixed objects (i.e. tree and sky). This makes it necessary to use more than one Gaussian to model the background of the pixel. The Gaussian mixture model (GMM) defines the likelihood of the random vector $X$ as a weighted sum of $K$ Gaussians:



$$p(X) = \sum_{j=1}^{K} w_j \, \mathcal{N}(X; \mu_j, \Sigma_j). \tag{6}$$



The parameters of the distributions ($\mu_j$ and $\Sigma_j$) are initialized with fixed values and updated dynamically [2] using the observations over time. Similarly, the weight of each Gaussian component ($w_j$) is also updated over time, such that the weights of frequently observed components are set to higher values. At any time, there exists a Gaussian mixture model for each pixel that can be used to infer the likelihood that the current observation belongs to the background. This likelihood can be calculated using Equation (6); however, not all of the Gaussians in the mixture are always used, because some of them may be generated by temporary foreground objects. The distinction between foreground and background Gaussians is determined by their weights: the weights are ordered from highest to lowest, and the first $B$ Gaussians whose cumulative weight is lower than a fixed threshold are taken as belonging to the background.
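The selection of background components by weight can be sketched as follows. This plain-Python illustration follows the rule as worded in the text (keep components, in order of decreasing weight, while the cumulative weight stays below the threshold); the function name, the threshold values and the choice to always keep at least one component are assumptions of the example.

```python
def select_background_components(weights, threshold):
    """Return the indices of the mixture components treated as background.

    Components are visited in order of decreasing weight and kept while the
    running sum of weights stays below `threshold`. At least one component is
    always kept (an assumption of this sketch, so the background model is
    never empty).
    """
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    selected, cumulative = [], 0.0
    for j in order:
        if selected and cumulative + weights[j] >= threshold:
            break
        selected.append(j)
        cumulative += weights[j]
    return selected

# Hypothetical per-pixel mixture weights for K = 3 components.
mixture_weights = [0.1, 0.6, 0.3]
background = select_background_components(mixture_weights, threshold=0.95)
```

Here the two heaviest components (indices 1 and 2) are kept as background, while the lightest one, which could stem from a temporary foreground object, is left out.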

This distinction may be used for binary classification as well as for calculating the likelihood that the current observation of the pixel belongs to the background. The likelihood that pixel $i$ is generated by the background ($BG$) is equal to the likelihood of the feature vector of the pixel under the mixture that contains only the background Gaussians:

$$p(X_i \mid i \in BG) = \sum_{k=1}^{B} w_k \, \mathcal{N}(X_i; \mu_k, \Sigma_k). \tag{7}$$

Since this method models the background over time, it can adapt to changes and update the background model with respect to new observations. This is different from, and superior to, training separate models for background and foreground, since a pixel that is classified as foreground with this model may blend into the background over time.

Equation (7) can be used to calculate the probability that the observed pixel $i$ with feature vector $X_i$ belongs to the background using Bayes' rule [9]:

$$p(C_i = -1 \mid X_i) = \frac{p(X_i \mid C_i = -1)\, p(C_i = -1)}{p(X_i, C_i = -1) + p(X_i, C_i = +1)}. \tag{8}$$

Here, $C_i$ is the same random variable introduced in Equations (4) and (5), indicating whether pixel $i$ belongs to the object or to the background. $p(C_i = -1)$ is the prior probability that the pixel belongs to the background at any time, and $p(C_i = +1)$ is its complement, $1 - p(C_i = -1)$. Since $p(C_i = -1 \mid X_i)$ is the probability that the observed pixel value belongs to the background, its complement gives the probability that the observed pixel value belongs to non-background, and thus to any moving object:

$$p(C_i = +1 \mid X_i) = 1 - p(C_i = -1 \mid X_i). \tag{9}$$

The probability in Equation (9) is higher for pixel values that are less likely to be generated by the past background structure of that pixel, and we use it as the second type of classifier in our classifier combination system. Although the class label +1 and the probability in Equation (9) refer in this context to any moving object rather than only the tracked object, it is appropriate to use them since they discriminate well between the fixed background and the tracked object, which is usually not fixed. Since we are interested in the pixels within the search window defined in Section 2, we use the probability values only for those pixels. However, the parameters of the background model of each pixel are updated over the whole image. Our idea is to use these posterior probabilities as another weight image, $w_i = p(C_i = +1 \mid X_i)$, to aid our mean-shift tracking algorithm.
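A minimal plain-Python sketch of turning a background mixture into a foreground posterior with Bayes' rule is given below. Several ingredients are assumptions of the example and are not specified in the text: isotropic covariances (`var * I`), the prior `prior_bg = 0.9`, and a uniform foreground likelihood over the 256^3 RGB cube. The component parameters are likewise made up for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of an isotropic Gaussian N(x; mean, var * I) in len(x) dimensions."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-0.5 * sq_dist / var) / ((2 * math.pi * var) ** (d / 2))

def background_likelihood(x, components):
    """Weighted sum over the background components, given as (weight, mean, var)."""
    return sum(w * gaussian_pdf(x, mean, var) for w, mean, var in components)

def foreground_posterior(x, components, prior_bg, fg_likelihood):
    """p(C=+1 | X) = 1 - p(C=-1 | X), with p(C=-1 | X) obtained from Bayes' rule."""
    joint_bg = background_likelihood(x, components) * prior_bg
    joint_fg = fg_likelihood * (1.0 - prior_bg)  # assumed foreground model
    return 1.0 - joint_bg / (joint_bg + joint_fg)

# Hypothetical background model of one pixel: two RGB modes (e.g. leaf and sky).
components = [(0.7, (100, 100, 100), 36.0), (0.3, (200, 200, 200), 36.0)]
uniform_fg = 1.0 / 256 ** 3  # assumption: foreground colors are uniformly distributed

near_background = foreground_posterior((101, 99, 100), components, 0.9, uniform_fg)
far_from_background = foreground_posterior((20, 240, 30), components, 0.9, uniform_fg)
```

A color close to one of the background modes yields a posterior near 0, while a color far from both modes yields a posterior near 1, which is the behavior expected of the resulting weight image.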



4. Classifier Combination for Mean-Shift Tracking

Combining the inputs of several sensors, or the outputs of different processings of the same sensor, can be used to improve tracking accuracy. Besides [7], which calculates a sum of mean-shift vectors obtained from different features, methods in the literature include feature selection by AdaBoost [10], combination of classifier outputs under a linear programming framework [11], combination of linear support vector machines [12], random forest classifiers [13], and combination of thermo-visual and regular camera images [14] to perform tracking.

As mentioned in Section 2, a classifier that performs binary (object and outer) classification by learning from samples of both classes can be trained and used to calculate the probability of belonging to the tracked object for each pixel in the search window. The classifier described in Section 3 learns how to model the background using past data and infers the probability that a pixel belongs to a foreground object. In our view, the distinct and heterogeneous nature of these two classifiers gives them complementary properties, which makes using both of them a suitable approach.

When the tracked object passes near a fixed object that has a similar appearance, the classifier that tries to separate the object from the outer area may fail to discriminate the two objects because of their similar color profiles. However, the background classifier will continue to classify the fixed object as background and assign it a low probability of belonging to a moving object, which helps to discriminate the tracked object from the fixed one. In contrast, when two moving objects come near each other, the background classifier will generate a high probability of belonging to a moving object for both. This time, however, the classifier that tries to separate the object from the outer area will discriminate the two objects as long as their color profiles are different.

The question at this point is how to combine the outputs of these two classifiers. Simple averaging is not suitable, since in some situations either classifier may produce very poor results. We propose a dynamic scheme that calculates a weight ($\lambda$) for the classifiers at each frame and uses it on the next frame to calculate their combined output:



$$w_i = \lambda\, w_i^{1} + (1 - \lambda)\, w_i^{2}. \tag{10}$$



We treat the outputs of the classifiers as separate weight images, each of which could potentially be used for mean-shift tracking. Their weighted combination calculated with Equation (10) is the final weight image that we use for our classifier-combination-based mean-shift tracking. The calculation of $\lambda$ relies on the idea that these weights are a measure of how confident each classifier is. Although it is not indicated explicitly in Equation (10), the combination parameter $\lambda$ varies dynamically from frame to frame. To be able to regard weight images as grayscale images, we scale the weight values in the range [0, 1] to integral values in the range [0, 255].

Figure 4: The ideal histogram $H^{ideal}_{obj}$, defined as the histogram of the pixels that belong to the object in the ideal weight image

Figure 5: The ideal histogram $H^{ideal}_{out}$, defined as the histogram of the pixels that are outside the object in the ideal weight image

To define the measure of confidence, we first define what a good classifier means: an ideal classifier outputs a posterior probability of one for the pixels that really belong to the tracked object and zero for the others. Based on this definition, we define two binned histograms of the weight images that such an ideal classifier would generate, one concentrated around 255 and the other around 0. The histograms can be seen in Figure 4 and Figure 5, where $H^{ideal}_{obj}$ in Figure 4 belongs to the object pixels in the ideal weight image and $H^{ideal}_{out}$ in Figure 5 belongs to the outer pixels. Grayscale values in these histograms are collected in eight bins, so each bin represents 256/8 = 32 consecutive grayscale levels.
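The eight-bin histograms can be sketched in plain Python as follows. The function name and the exact shape chosen for the two ideal histograms (all mass in the top bin for the object, all mass in the bottom bin for the outside) are assumptions of this example; the shapes drawn in Figures 4 and 5 may differ in detail.

```python
def weight_histogram(weights, num_bins=8):
    """Normalized histogram of weights in [0, 1] after scaling to integers in [0, 255]."""
    counts = [0] * num_bins
    levels_per_bin = 256 // num_bins  # 32 grayscale levels per bin when num_bins is 8
    for w in weights:
        level = min(255, max(0, int(round(w * 255))))
        counts[level // levels_per_bin] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_bins

# Assumed ideal histograms: object weights concentrated around 255, outside weights around 0.
IDEAL_OBJECT = [0.0] * 7 + [1.0]
IDEAL_OUTSIDE = [1.0] + [0.0] * 7

# Hypothetical classifier outputs for object pixels and for outside pixels.
object_hist = weight_histogram([1.0, 0.95, 0.9])
outside_hist = weight_histogram([0.0, 0.05, 0.1])
mixed_hist = weight_histogram([0.0, 0.5, 1.0, 1.0])
```

A classifier whose object scores all fall in the top bin and whose outside scores all fall in the bottom bin reproduces the ideal histograms exactly; scores spread over several bins, as in `mixed_hist`, depart from them.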

With the ideal histograms defined, we now relate them to the output image generated by any classifier. Let a classifier $c$ generate a weight image for a frame at time $t$, let $H^{c}_{obj}(t)$ be the histogram of the part of the weight image that belongs to the tracked object, and let $H^{c}_{out}(t)$ be the histogram of the outside pixels in the search window. The confidence of the classifier for the next frame at time $t + 1$ may be inferred from the similarity of these two generated histograms to the ideal histograms. Since histograms can be interpreted as discrete signals, we employ the signal correlation coefficient, which produces normalized values. The signal correlation coefficient between any two signals $g$ and $h$ is defined as:



$$\rho(g, h) = \frac{\sum_{b} (g(b) - \mu_g)(h(b) - \mu_h)}{\sigma_g \sigma_h}, \tag{11}$$



where $b$ denotes the bins of the histograms (of the weight images) and $\mu$ and $\sigma$ are the mean and standard deviation of the relevant histogram. $\rho$ varies between -1 and +1, and higher values mean that the two histograms are more similar. We can use $\rho$ to measure the similarity of classifier-generated histograms to the ideal histograms, and the confidence $G$ of a classifier can be defined as the sum of the correlations between the two generated histograms (for object and outside) and the two ideal histograms (again for object and outside):



$$G_c(t + 1) = \rho\big(H^{c}_{obj}(t), H^{ideal}_{obj}\big) + \rho\big(H^{c}_{out}(t), H^{ideal}_{out}\big). \tag{12}$$
Since we use two classifiers ($c = 1, 2$), $\lambda$ in Equation (10) can be calculated as the ratio of the confidence of the first classifier to the total confidence of the two classifiers:



$$\lambda = \frac{G_1}{G_1 + G_2}. \tag{13}$$



In summary, after tracking is finished on a frame at time $t$, the $\lambda$ value for the next frame (at time $t + 1$) is calculated using Equation (12) and Equation (13). On the next frame, the outputs of the classifiers are combined using this $\lambda$ value as shown in Equation (10). An overall summary of the steps of the proposed classifier combination system is shown in Figure 6.
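The confidence-based weighting summarized above can be sketched in plain Python as follows. The sketch makes several assumptions that the text does not spell out: the correlation is normalized by the root sums of squared deviations so that it lies in [-1, 1]; a flat histogram is treated as uncorrelated; negative confidences are clipped to zero before the ratio is taken; and the ideal histograms put all mass in the top and bottom bins. All function names are illustrative.

```python
import math

def correlation(g, h):
    """Correlation coefficient between two equal-length histograms, in [-1, 1]."""
    n = len(g)
    mean_g, mean_h = sum(g) / n, sum(h) / n
    cov = sum((a - mean_g) * (b - mean_h) for a, b in zip(g, h))
    norm = math.sqrt(sum((a - mean_g) ** 2 for a in g) * sum((b - mean_h) ** 2 for b in h))
    return cov / norm if norm > 0 else 0.0  # flat histogram: assumed uncorrelated

def confidence(obj_hist, out_hist, ideal_obj, ideal_out):
    """Sum of the correlations of the object and outside histograms with their ideals."""
    return correlation(obj_hist, ideal_obj) + correlation(out_hist, ideal_out)

def fusion_weight(g1, g2):
    """lambda = G1 / (G1 + G2); negative confidences are clipped (assumption)."""
    g1, g2 = max(g1, 0.0), max(g2, 0.0)
    total = g1 + g2
    return g1 / total if total > 0 else 0.5

def fuse(w1, w2, lam):
    """Per-pixel combination lam * w1 + (1 - lam) * w2 of two weight images."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(w1, w2)]

ideal_obj = [0.0] * 7 + [1.0]
ideal_out = [1.0] + [0.0] * 7

# Classifier 1 matches both ideal histograms; classifier 2 is uninformative outside the object.
g1 = confidence(ideal_obj, ideal_out, ideal_obj, ideal_out)
g2 = confidence(ideal_obj, [0.125] * 8, ideal_obj, ideal_out)
lam = fusion_weight(g1, g2)
fused = fuse([1.0, 0.0], [0.4, 0.6], lam)
```

In this toy case the first classifier reaches the maximum confidence of 2 and the second reaches 1, so the first contributes two thirds of the final weight image on the next frame.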

5. Experiments and Results 

To demonstrate the proposed approach, we selected some objects from the PETS 2001 database and performed tracking on them. At every frame we trained or updated an AdaBoost classifier using the pixels that belong to the object and the outside pixels. We also trained a background classifier as described in Section 3.

At each frame we generated weight images from the outputs of the classifiers. We then applied classifier combination to obtain a final weight image, which is the weighted combination of the two. The contribution of each weight image to the final one is calculated using the approach proposed in Section 4.

For the mean-shift tracking we used the Camshift [6] extension, which adaptively resizes the search window at each frame. The amount of shift and resizing of the search window is determined from the generated final weight image.

To compare results, we performed tracking using the output of each classifier independently as well as their combined output. This allowed us to observe scenarios where the independent classifiers fail and the combined approach succeeds. As argued in Section 4, the AdaBoost classifier failed in situations where the tracked object passes near a fixed object with a similar color, an example of which can be seen in Figure 7. In that figure, the weight image generated by the AdaBoost classifier clearly assigns high values to pixels belonging to the fixed object, since its color is very similar to that of the tracked object. However, the background classifier separates the two well, since the pixels belonging to the fixed object currently have color values that are very likely to be generated by the background model.

In contrast, Figure 8 shows another situation where two moving objects come together. This time the background classifier fails to separate the objects, because the pixels of both objects are assigned high values in the weight image since their current color values are not likely to be generated by the background model. However, the AdaBoost classifier achieves much better separation here, since the color values of the two objects are different and the classifier assigns low values to the pixels of the untracked object.

Figure 6: Overall summary of the proposed tracking system; matching colors indicate related steps

Figure 7: A sample tracking scene (upper left), weight image from AdaBoost classifier (upper right), weight image from background classifier (lower left), weight image from proposed classifier combination (lower right)

Figure 8: A sample tracking scene (upper left), weight image from AdaBoost classifier (upper right), weight image from background classifier (lower left), weight image from proposed classifier combination (lower right)

It can be seen that the generated final weight images are well suited to mean-shift tracking in both situations. Since these final weight images are weighted combinations of the other two, with dynamically calculated contributions, the proposed approach succeeds in establishing a proper classifier combination scheme.

To present the results of tracking for different objects, a measure [16] of tracking performance has been used. This measure defines the success ratio of the tracking at a single frame by:



$$SR = \frac{A_r \cap A_t}{A_r \cup A_t}, \tag{14}$$



where $A_r$ is the real area of the tracked object (the hand-marked ground truth) and $A_t$ is the area of the object found by the tracker. This ratio varies between 0 and 1, where 1 means that the tracker performed a perfect job and tracked the object by including all of its pixels and nothing more. In contrast, 0 means that the tracker failed to track the object altogether.
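For axis-aligned rectangles, the success ratio can be computed as in the following plain-Python sketch. Representing both areas as rectangles `(x0, y0, x1, y1)` is an assumption of the example; the measure itself applies to arbitrary pixel regions.

```python
def rect_area(r):
    """Area of a rectangle (x0, y0, x1, y1); empty or inverted rectangles have area 0."""
    x0, y0, x1, y1 = r
    return max(0, x1 - x0) * max(0, y1 - y0)

def success_ratio(truth, tracked):
    """Intersection area divided by union area of two axis-aligned rectangles."""
    inter = rect_area((
        max(truth[0], tracked[0]), max(truth[1], tracked[1]),
        min(truth[2], tracked[2]), min(truth[3], tracked[3]),
    ))
    union = rect_area(truth) + rect_area(tracked) - inter
    return inter / union if union else 0.0

ground_truth = (0, 0, 10, 10)
perfect = success_ratio(ground_truth, (0, 0, 10, 10))
half_shifted = success_ratio(ground_truth, (5, 0, 15, 10))
lost = success_ratio(ground_truth, (20, 20, 30, 30))
```

A tracker window shifted by half the object width already drops the ratio to one third, which shows that the measure penalizes both missed object pixels and included background pixels.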

In Table 1 we present the tracking success ratios of the trackers for different objects, where a * mark indicates that the tracker failed to track the object until the end. The trackers are initialized using the ground truth rectangles of the objects after they are fully visible in the frame. Videos that show the tracking process using the separate classifiers and their fusion can be found at http://students.sabanciuniv.edu/

In Table 1 it can be seen that in all of the situations where an individual classifier fails to track the object until the end, the tracker obtained with our fusion approach tracks the object successfully. In addition, even in the cases where both classifiers succeed individually, the measure presented in Equation (14) is always higher for the tracker obtained with our fusion approach.



Table 1: Tracking success ratios for different objects and trackers (* marks runs in which the tracker failed to track the object until the end)

Object               AdaBoost Only   Backgrnd. Only   Classifier Comb.
Red Coat Female      9.62*           42.05*           52.85
Blue Car Peugeot     64.34           58.94            75.91
White Van            65.79           46.04            74.72
Left Entry Male      23.62*          33.51            47.60
Female Blue Skirt    1.93*           36.66            52.29



6. Conclusion 

As can be seen from the numeric results, weight images generated using the classifier combination scheme give the best tracking results and even succeed in situations where the individual classifiers fail when used alone. The reasons why individual classifiers may fail, and how the dynamic combination scheme overcomes these failures, can also be seen in the supplied sample figures and videos. The dynamic and non-parametric calculation of classifier contributions and its positive effect on tracking emphasize the robustness of the proposed scheme.

In this work we have employed only the RGB color values of the pixels; however, more complex but better classifiers may be obtained by employing other scene information such as local texture. Additionally, the search window has been taken as a simple rectangular area in this work; a search window that adapts to the actual shape of the object may also increase the tracking accuracy.